A non-parametric method to estimate the number of clusters

Authors

  • André Fujita
  • Daniel Y. Takahashi
  • Alexandre Galvão Patriota
Abstract

An important and still unsolved problem in unsupervised data clustering is how to determine the number of clusters. The proposed slope statistic is a non-parametric, data-driven approach for estimating the number of clusters in a dataset. This technique uses the output of any clustering algorithm and identifies the maximum number of groups that breaks down the structure of the dataset. Intensive Monte Carlo simulation studies show that the slope statistic outperforms (for the considered examples) some popular methods that have been proposed in the literature. Applications to graph clustering and to the iris and breast cancer datasets are shown. © 2013 Elsevier B.V. All rights reserved.
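The abstract does not give the formula for the slope statistic, so the following is only a minimal sketch of the general idea it describes: take a quality index computed from the output of any clustering algorithm for each candidate number of clusters k, and pick the k after which quality falls most sharply. The choice of index (for example the average silhouette), the scoring rule `(q(k) - q(k+1)) * q(k)**p`, the function name `estimate_num_clusters`, the parameter `p`, and the toy input values are all assumptions made for illustration, not details confirmed by this page.

```python
def estimate_num_clusters(quality, p=1):
    """Return the k whose quality is high and drops most sharply at k + 1.

    quality: dict mapping a candidate number of clusters k to a quality
             index (assumed here to be something like the average
             silhouette) obtained from any clustering algorithm.
    p:       hypothetical positive integer that weights the quality at k
             against the size of the drop from k to k + 1.
    """
    best_k, best_score = None, float("-inf")
    for k in sorted(quality):
        if k + 1 not in quality:
            continue  # the drop to k + 1 is undefined for this k
        drop = quality[k] - quality[k + 1]
        score = drop * quality[k] ** p
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Made-up quality values for k = 2..6, for illustration only.
example_quality = {2: 0.55, 3: 0.71, 4: 0.40, 5: 0.35, 6: 0.33}
estimated_k = estimate_num_clusters(example_quality)
print(estimated_k)
```

In this toy input the quality peaks at k = 3 and then falls sharply, so the sketch selects 3. Because the function only consumes per-k quality values, it stays independent of the clustering algorithm that produced them, which mirrors the abstract's claim that the method works with the output of any clustering algorithm.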

Similar resources

A Bootstrap Interval Robust Data Envelopment Analysis for Estimate Efficiency and Ranking Hospitals

Data envelopment analysis (DEA) is one of the non-parametric methods for evaluating the efficiency of each unit. Limited resources in the healthcare economy are the main reason for measuring the efficiency of hospitals. In this study, a bootstrap interval data envelopment analysis (BIRDEA) is proposed for measuring the efficiency of hospitals affiliated with the Hamedan University of Medical Sciences. The propos...


Evaluating the efficiency of Iranian industrial universities based on non-parametric and parametric approaches

The present study evaluates the efficiency of Iranian industrial universities using the non-parametric method of data envelopment analysis and the parametric method of stochastic frontier analysis, with input variables (number of incoming students, number of faculty members, number of staff and budget) and output variables (specific income, number of students studying, number of graduates and conference papers) and...


Parametric and non-parametric methods for estimating the pattern of natural menopause age using prevalence data

Background: Various studies have shown that the age of menopause in Iran varies over a relatively wide range, approximately from 46 to 51 years. This broad range can be attributed not only to essential differences between women of different regions, but could also be related to methodological problems. Because the model of menopause age could be estimated by cohort or a cros...


Evaluation of grain yield stability of lentil genotypes using non-parametric methods

Genotype × environment interaction is one of the main challenges in plant breeding. Various statistical methods have been introduced to estimate the genotype × environment interaction and to choose stable and productive genotype(s). In this study, 14 lentil genotypes along with two controls (Sepehr and Gachsaran cultivars) were evaluated during four growing seasons (2016...


Investigation of Trend of Precipitation Variation Using Non-Parametric Methods in Charmahal O Bakhtiari Province

Climatic parameters change over time and space for many reasons, and the nature of these changes should be determined from observations using statistical methods. Trend analysis is among the most widely used statistical methods for assessing potential climate change in hydrological time series such as precipitation, temperature and flow rate. This study of 11 synoptic, rain gauge and...



Journal:
  • Computational Statistics & Data Analysis

Volume 73


Publication date: 2014